Distributions cannot be freed by garbage collector due to self-references #13986

Closed
freddyaboulton opened this issue May 4, 2021 · 1 comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats

Comments


freddyaboulton commented May 4, 2021

I believe that continuous distributions in scipy.stats.distributions cannot be freed by the garbage collector due to self-references in the rv_continuous base class implementation.

Reproducing code example:

This requires objgraph (pip install objgraph).

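import gc
import scipy
from scipy.stats.distributions import uniform
import objgraph


def dump_garbage():
    # Force a collection; with DEBUG_SAVEALL set, unreachable objects land in gc.garbage.
    gc.collect()

    # Sanity check: the non-leaky list must never show up in gc.garbage.
    assert [1] not in gc.garbage
    for x in gc.garbage:
        # Plot what still refers to the uniform distribution instance.
        if isinstance(x, scipy.stats._continuous_distns.uniform_gen):
            objgraph.show_backrefs([x], filename='gen-backrefs.png')
        # Plot what still refers to the self-referencing list.
        if isinstance(x, list) and x[0] == 'my-list':
            objgraph.show_backrefs([x], filename='leaky-list-backrefs.png')


def non_leaky_list():
    # No cycle: the list is released as soon as the name is deleted.
    l = []
    l.append(1)
    del l
    dump_garbage()


def leaky_list():
    # The list contains itself, which creates a reference cycle.
    l = ['my-list']
    l.append(l)
    del l
    dump_garbage()


def make_uniform_dist():
    # Create and immediately delete a uniform distribution.
    uni = uniform(loc=0, scale=1)
    del uni
    dump_garbage()


if __name__ == "__main__":
    gc.enable()
    # Keep every unreachable object in gc.garbage so it can be inspected.
    gc.set_debug(gc.DEBUG_SAVEALL)
    print("Non Leaky List")
    non_leaky_list()
    print("Leaky list")
    leaky_list()
    print("Make uniform dist")
    make_uniform_dist()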

This will produce two plots, one that shows the references keeping the leaky list in memory and one that shows the references keeping the uniform distribution in memory despite both being deleted. As a sanity check, I assert that the non-leaky list is never present in gc.garbage.
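The sanity check described above can also be sketched with the standard library only, without objgraph or SciPy. This is a minimal sketch, not part of the original report: the helper name cyclic_garbage_after and the two factory functions are made up for illustration. It relies on the documented behavior of gc.DEBUG_SAVEALL, which appends every unreachable object the collector finds to gc.garbage.

```python
import gc


def cyclic_garbage_after(factory):
    """Create an object with factory(), drop it, and return what ends up in gc.garbage.

    Hypothetical helper for illustration only.
    """
    old_flags = gc.get_debug()
    gc.collect()            # clear out unrelated garbage first
    gc.garbage.clear()
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        obj = factory()
        del obj             # drop the only external reference
        gc.collect()        # unreachable objects are saved in gc.garbage
        return list(gc.garbage)
    finally:
        gc.set_debug(old_flags)
        gc.garbage.clear()


def make_leaky():
    # A list that contains itself (a reference cycle).
    l = ['my-list']
    l.append(l)
    return l


def make_plain():
    # A list without any cycle.
    return ['plain', 1]


leaky_found = any(isinstance(x, list) and x[:1] == ['my-list']
                  for x in cyclic_garbage_after(make_leaky))
plain_found = any(isinstance(x, list) and x[:1] == ['plain']
                  for x in cyclic_garbage_after(make_plain))
```

Only the self-referencing list should be reported, since the plain list is released by reference counting before the collector ever runs.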

Leaky list back references

To also sanity check the objgraph output, I display the references keeping the leaky list in memory. There's a self-reference displayed, which makes sense.

leaky-list-backrefs
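The self-reference shown in the plot can be cross-checked without objgraph by asking the gc module which objects refer to the list. A small sketch, where the helper name refers_to_itself is hypothetical:

```python
import gc


def refers_to_itself(obj):
    """Return True if obj is one of its own direct referrers."""
    return any(r is obj for r in gc.get_referrers(obj))


leaky = ['my-list']
leaky.append(leaky)      # the list now contains itself

plain = ['my-list', 1]   # same first element, but no cycle
```

Here refers_to_itself(leaky) is true and refers_to_itself(plain) is false, which matches the self-loop in the back-reference graph.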

Uniform distribution back references

We see that the np.vectorize'd methods (self._ppf_single_call, self._entropy, self._cdf_single_call, self._mom1_sc) and the _parse_args methods have references to the uniform distribution, despite being created by the uniform distribution!

I found that surprising! I wonder if the np.vectorize calls and the _parse_args methods can be refactored to avoid self-references. I don't think it's ideal that these distributions will never be released from memory by the garbage collector.

gen-backrefs
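To make the pattern concrete, here is a minimal pure-Python sketch of how storing a callable built from a bound method on the instance creates a cycle, and one possible way around it using a weak reference. The classes CyclicDist and WeakDist and the helper freed_by_refcount_alone are hypothetical stand-ins. This is not SciPy's implementation, and whether such a change would preserve backwards compatibility there is not something this sketch addresses.

```python
import gc
import weakref


class CyclicDist:
    """Hypothetical stand-in: keeps a bound method of itself as an attribute."""

    def __init__(self):
        # The bound method holds a strong reference to self,
        # so self -> self._call -> self forms a reference cycle.
        self._call = self._impl

    def _impl(self, x):
        return x


class WeakDist:
    """Hypothetical alternative: the stored callable only holds a weak reference."""

    def __init__(self):
        self_ref = weakref.ref(self)

        def call(x):
            obj = self_ref()
            if obj is None:
                raise ReferenceError("instance no longer exists")
            return obj._impl(x)

        self._call = call

    def _impl(self, x):
        return x


def freed_by_refcount_alone(cls):
    """Return True if an instance disappears on `del` with the cyclic collector off."""
    was_enabled = gc.isenabled()
    gc.collect()
    gc.disable()
    try:
        obj = cls()
        probe = weakref.ref(obj)
        del obj
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()
```

On CPython, freed_by_refcount_alone(CyclicDist) is false and freed_by_refcount_alone(WeakDist) is true. The check only looks at reference counting; it says nothing about what the cyclic collector does on a later pass.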

Scipy/Numpy/Python version information:

1.6.3 1.20.2 sys.version_info(major=3, minor=8, micro=8, releaselevel='final', serial=0)
@ev-br added the defect and scipy.stats labels May 5, 2021

ev-br (Member) commented May 5, 2021

These involve fairly hairy hacks, unfortunately. If you see a way of refactoring these to avoid circular references and keep the backwards compatibility, a PR is very welcome.

@mdhaber closed this as not planned May 29, 2023